{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c81efda5",
   "metadata": {},
   "source": [
    "# Retail sales\n",
    "\n",
    "In this notebook we will prepare and store the retail sales dataset found [here](https://raw.githubusercontent.com/facebook/prophet/master/examples/example_retail_sales.csv).\n",
    "\n",
    "**Description of data:**\n",
    "\n",
    "The timeseries is collected between January 1992 and May 2016. It consists of a single series of monthly values representing sales volumes. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "888749e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from statsmodels.tsa.seasonal import STL"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25cc2a1f",
   "metadata": {},
   "source": [
    "# Get the dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73ac5d57",
   "metadata": {},
   "source": [
    "The dataset can be obtained from this [link](https://raw.githubusercontent.com/facebook/prophet/master/examples/example_retail_sales.csv). It will open a raw file in GitHub. A simple way of obtaining the data is to copy and paste the values from your browser into a text editor of your choice. \n",
    "Save it in the Datasets directory, which is found at the root of this project, with the filename `example_retail_sales.csv`. \n",
    "\n",
    "Alternatively, download it using Pandas by running:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "15c6a149",
   "metadata": {},
   "outputs": [],
   "source": [
    "url = \"https://raw.githubusercontent.com/facebook/prophet/master/examples/example_retail_sales.csv\"\n",
    "df = pd.read_csv(url)\n",
    "df.to_csv(\"../Datasets/example_retail_sales.csv\", index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5feac9ec",
   "metadata": {},
   "source": [
    "Now follow the rest of the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "707768c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv(\n",
    "    \"../Datasets/example_retail_sales.csv\",\n",
    "    parse_dates=[\"ds\"],\n",
    "    index_col=[\"ds\"],\n",
    "    nrows=160,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "three-blind",
   "metadata": {},
   "source": [
    "# Create dataset with missing data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "112f9b90",
   "metadata": {},
   "outputs": [],
   "source": [
    "# copy dataframe\n",
    "df_with_missing_data = df.copy()\n",
    "\n",
    "# Insert missing data into dataframe\n",
    "df_with_missing_data.iloc[10:11] = np.NaN\n",
    "df_with_missing_data.iloc[25:28] = np.NaN\n",
    "df_with_missing_data.iloc[40:45] = np.NaN\n",
    "df_with_missing_data.iloc[70:94] = np.NaN"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "45acce8b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save dataset in Datasets directory\n",
    "df_with_missing_data.to_csv(\"../Datasets/example_retail_sales_with_missing_data.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80293d1b",
   "metadata": {},
   "source": [
    "# Create dataset with outliers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "b78e8d57",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_with_outliers = df.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "57bf7198",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Insert outliers into dataframe\n",
    "outlier_idx = [20, 33, 66, 150]\n",
    "df_with_outliers.iloc[outlier_idx] = df_with_outliers.iloc[outlier_idx] * 1.7"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "ce560e64",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save dataset in Datasets directory\n",
    "df_with_outliers.to_csv(\"../Datasets/example_retail_sales_with_outliers.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "41606a6b",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "fets",
   "language": "python",
   "name": "fets"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
